254

17

Genomics

An alternative to the above is to create a water-in-oil emulsion from library DNA,

PCR reagents, beads to which the DNA can attach, and oil. Each aqueous globule

should contain one bead with one strand of DNA; because of the random nature of

the mixing that creates the emulsion, only 10–20% of the globules (“microwells”)

fulfil this criterion. Using the usual PCR procedure, the DNA fragments are multiply

copied to create the desired clusters of identical strands. These beads can then be

arranged in an array.
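
The low yield of usable globules can be made plausible with a small calculation. The sketch below assumes, purely for illustration, that beads and DNA strands are loaded into globules independently and that each follows a Poisson distribution; the source says only that the mixing is random, and the function names are hypothetical. Under that assumption the fraction of globules holding exactly one bead and exactly one strand peaks at about 13.5%, which is consistent with the 10–20% quoted above.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing exactly k items when the mean count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_usable(mean_beads, mean_strands):
    """Fraction of globules with exactly one bead AND exactly one DNA strand.

    Assumes (not stated in the text) independent Poisson loading of both.
    """
    return poisson_pmf(1, mean_beads) * poisson_pmf(1, mean_strands)


# Best case under this model: one bead and one strand per globule on average.
best_case = fraction_usable(1.0, 1.0)      # equals exp(-2), roughly 0.135
# More dilute loading gives fewer doubly occupied globules but also fewer hits.
dilute_case = fraction_usable(0.5, 0.5)    # roughly 0.092
```

Raising or lowering either mean away from 1 reduces the usable fraction in this model, so it cannot be pushed much higher by adjusting concentrations alone.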

The key to the parallelization is array-based sequencing of the fragments. Early

NGS used pyrosequencing, but this has been superseded by other methods. Ion Torrent

sequencing takes place by synthesizing a new, complementary DNA strand one base

at a time; each time a new base is added, a hydrogen ion is released and detected by

a semiconductor pH sensor. Inaccuracy can arise when a run of the same base

occurs: depending on the length of the run, the base count may be uncertain by at least one. A

more accurate method is “sequencing by ligation” (SOLiD). A primer of N bases is

hybridized to the adapter, and the DNA is then exposed to a collection of octamers,

each of which has one of four fluorescent dyes at the 5’ end and a hydroxyl group at

the 3’ end. Bases 1 and 2 are complementary to the nucleotides to be sequenced, bases

3–5 are immaterial, and 6–8 are inosine bases; phosphorothioate links bases

5 and 6. DNA ligase then joins the octamer to the primer, and the fluorescent dye

is then cleaved using silver ions, generating a 5’-phosphate group that can undergo

further ligation. The dye (corresponding to one of the four bases) is identified, the

extension product is melted off, and a second round of sequencing is undertaken

with a primer of N − 1 bases. Although accurate, this method is limited to short read

lengths. Reversible terminator sequencing (Illumina) has two varieties, 3’-O-blocked

and 3’-unblocked. In the first, the target DNA fixed to a solid support is exposed to

the four bases, each with a different fluorophore attached. After binding, the base is

incorporated at the end of the primer by a polymerase, unincorporated nucleotides are washed away, and the support

is imaged to identify the base. The fluorophore is cleaved to regenerate the 3’-OH

terminus and the cycle is then repeated. In the second variety only one fluorophore

is used and the target DNA is exposed to each base in sequence.
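
The one-base-at-a-time schemes described above, and the ambiguity that arises with runs of the same base, can be illustrated with a toy model. The sketch below is not any instrument's actual signal processing: the flow order `TACG`, the function names, and the idea that the signal scales with the number of bases incorporated in one flow are assumptions made for illustration. The input is the strand being synthesized, read 5’ to 3’.

```python
FLOW_ORDER = "TACG"  # hypothetical cyclic order in which bases are offered


def flowgram(new_strand, flow_order=FLOW_ORDER):
    """Return one (base, count) pair per flow for the strand being built.

    In each flow a single base type is offered; every consecutive position
    that needs that base is filled, so a run of n identical bases yields a
    count of n in one flow.
    """
    if set(new_strand) - set(flow_order):
        raise ValueError("strand contains a base that is never offered")
    flows, pos, step = [], 0, 0
    while pos < len(new_strand):
        base = flow_order[step % len(flow_order)]
        count = 0
        while pos < len(new_strand) and new_strand[pos] == base:
            count += 1
            pos += 1
        flows.append((base, count))
        step += 1
    return flows


def call_bases(signals):
    """Round each flow's signal to a whole number of bases and rebuild the read."""
    return "".join(base * round(signal) for base, signal in signals)


ideal = flowgram("ACCCCCCCCT")   # A, a run of eight C, then T
# Assumed noise model: every signal reads 7% low. A single base still rounds
# to 1, but the run of eight reads 7.44 and is called as seven.
noisy = [(base, 0.93 * count) for base, count in ideal]
read = call_bases(noisy)         # "ACCCCCCCT", one C short
```

With a proportional error the absolute uncertainty grows with the length of the run, which is why long runs of one base are the first to be miscounted in this model.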

Third generation sequencing uses single molecules, hence avoiding errors introduced

by PCR and, very importantly, allowing much longer lengths of DNA to be

“read”. The technology continues to evolve increasingly rapidly and fourth generation

methods are emerging. Progress is now being hindered by the enormous amounts

of data being generated by the sequencing technologies. For clinical applications,

accuracy and throughput can be enhanced by constraining sequencing to limited

areas of the genome. If a reference genome is available (as it is in the human case),

the sequence fragments can be mapped onto it, greatly improving the speed and reliability

of assembling the complete sequence. For most clinical work, variation from

a canonical sequence is of the greatest interest.
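
Mapping fragments onto a reference and reporting departures from it can be sketched in a few lines. The example below is a deliberately naive illustration, not a description of any production tool: it tries every ungapped placement of each read, keeps the one with the fewest mismatches, and reports positions where the majority of overlapping reads disagree with the reference. The sequences and function names are invented for the example, and insertions, deletions, base qualities, and the reverse strand are all ignored.

```python
from collections import Counter, defaultdict


def best_placement(read, reference):
    """Return (offset, mismatches) of the ungapped placement with fewest mismatches."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def call_variants(reads, reference):
    """List (position, reference_base, observed_base) where reads disagree."""
    pileup = defaultdict(Counter)           # position -> counts of observed bases
    for read in reads:
        placement = best_placement(read, reference)
        if placement is None:               # read longer than the reference
            continue
        offset, _ = placement
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1
    variants = []
    for pos in sorted(pileup):
        observed, _ = pileup[pos].most_common(1)[0]
        if observed != reference[pos]:
            variants.append((pos, reference[pos], observed))
    return variants


reference = "GATTACAGGCTTAACCGT"
# Three overlapping reads from a sample carrying A instead of C at position 9.
reads = ["ACAGGATT", "AGGATTAA", "GATTAACC"]
variants = call_variants(reads, reference)   # [(9, 'C', 'A')]
```

Because each read only has to be placed against a known sequence rather than against every other read, the work grows with the number of reads, which is the practical advantage of having a reference.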